Abstract

We propose an efficient framework for enabling secure 
multi-party numerical computations in a Peer-to-Peer net- 
work. This problem arises in a range of applications such as 
collaborative filtering, distributed computation of trust and 
reputation, monitoring and numerous other tasks, where the 
computing nodes would like to preserve the privacy of their 
inputs while performing a joint computation of a certain
function.

Although there is a rich literature in the field of dis- 
tributed systems security concerning secure multi-party 
computation, in practice it is hard to deploy those meth- 
ods in very large scale Peer-to-Peer networks. In this work, 
we examine several possible approaches and discuss their 
feasibility. Among the possible approaches, we identify a 
single approach which is both scalable and theoretically se- 
cure. 

An additional novel contribution is that we show how to 
compute the neighborhood based collaborative filtering, a 
state-of-the-art collaborative filtering algorithm, winner of 
the Netflix progress prize of the year 2007. Our solution 
computes this algorithm in a Peer-to-Peer network, using a 
privacy preserving computation, without loss of accuracy. 

Using extensive large scale simulations on top of real 
Internet topologies, we demonstrate the applicability of our 
approach. As far as we know, we are the first to implement 
such a large scale secure multi-party simulation of networks 
of millions of nodes and hundreds of millions of edges. 
1 Introduction

We consider the problem of performing a joint numeri- 
cal computation of some function over a Peer-to-Peer net- 
work. Such problems arise in many applications, for ex-
ample, when distributively computing trust [18], ranking
of nodes and data items [10], clustering [5], collaborative
filtering [6, 27], factor analysis [12], etc. The aim of se-
cure multi-party computation is to enable parties to carry 
out such distributed computing tasks in a secure manner. 
Whereas distributed computing classically deals with ques- 
tions of computing under the threat of machine crashes and 
other inadvertent faults, secure multi-party computation is 
concerned with the possibility of deliberate malicious be- 
havior by some adversarial entity. That is, it is assumed that 
a protocol execution may come under attack by an exter- 
nal entity, or even by a subset of the participating parties. 
The aim of this attack may be to learn private information 
or cause the result of the computation to be incorrect. Thus, 
two central requirements on any secure computation proto- 
col are privacy and correctness. The privacy requirement 
states that nothing should be learned beyond what is ab- 
solutely necessary; more exactly, parties should learn their 
designated output and nothing else. The correctness re- 
quirement states that each party should receive its correct 
output. Therefore, the adversary must not be able to cause 
the result of the computation to deviate from the function 
that the parties had set out to compute. 

In this paper, we consider only functions which are built
using the algebraic primitives of addition, subtraction and
multiplication. In particular, we focus on numerical methods
which are computed distributively in a Peer-to-Peer network,
where in each iteration, every node interacts with a subset
of its neighbors by sending scalar messages, and computing a
weighted sum of the messages that it receives. Examples of
such functions are belief propagation [22], EM (expectation
maximization) [12], the Power method [18], separable
functions [20], gradient descent methods [25] and linear
iterative algorithms for solving systems of linear
equations [9]. As a specific example, we describe the Jacobi
algorithm in detail in Section 5.1.

There is a rich body of research on secure computation,
starting with the seminal work of Yao [26]. Part of this
research is concerned with the design of generic secure
protocols that can be used for computing any function (for
example, Yao's work [26] for the case of two participants,
and e.g. [8, 16] for solutions for the case of multiple
participants). There are several works concerning the
implementation of generic protocols for secure computation.
For example, FairPlay [19] is a system for secure two-party
computation, and FairPlayMP [7] is a different system for
secure computation by more than two parties. These two
systems are based (like Yao's protocol) on reducing any
function to a representation as a Boolean circuit and
computing the resulting Boolean circuit securely. Our
approach is much more efficient, at the cost of supporting
only a subset of the functions the FairPlay system can
compute.

A different line of work studies secure protocols for
computing specific functions (rather than generic protocols
for computing any function). Of particular interest for us
are works that add a privacy preserving layer to the
computation of functions such as the factor analysis learning
problem (for which [12] describes a secure multi-party
protocol using homomorphic encryption), computing trust in a
Peer-to-Peer network (for which [18] suggests a solution
using a trusted third party), or the work of [25], which is
closely related to our work, but is limited to two parties.

Most previous solutions for secure multi-party compu- 
tation suffer from one of the following drawbacks: (1) 
they provide a centralized solution where all information 
is shipped to a single computing node, and/or (2) require 
communication between all participants in the protocol, 
and/or (3) require the use of asymmetric encryption, which 
is costly. In this work, we investigate secure computation in 
a Peer-to-Peer setting, where each node is only connected to 
some of the other nodes (its neighbors). We examine differ- 
ent possible approaches, and out of the different approaches 
we identify a single approach, which is theoretically secure, 
efficient, and scalable. 

Security is often based on the assumption that there is 
an upper bound on the global number of malicious partici- 
pants. In our setting, we consider the number of malicious 
nodes in each local vicinity. Furthermore, most of the ex- 
isting algorithms scale to tens or hundreds of nodes at the 
most. In this work, we address the problem in a setting of a 
large Peer-to-Peer network, with millions of nodes and
hundreds of millions of communication links. Unlike most of
the previous work, we have performed a very large scale
simulation, using real Internet topologies, to show that our
approach is applicable to real network settings.

As an example for applications of our framework, we 
take the neighborhood based collaborative filtering [6]. This 
algorithm is a recent state-of-the-art algorithm. There are 
two challenges in adapting this algorithm to a Peer-to-Peer 
network. First, the algorithm is centralized and we propose 
a method to distribute it. Second, we add a privacy pre- 
serving layer, so no information about personal ranking is 
revealed during the process of computation. 

The paper is organized as follows. In Section 2 we formulate
our problem model. In Section 3 we give a brief background of
the cryptographic primitives that are used in our schemes.
Section 4 outlines our novel construction. We give a detailed
case study of collaborative filtering as an example
application in Section 5. Large scale simulations are
presented in Section 6. We conclude in Section 7.

We use the following notations: $T$ stands for a vector or
matrix transpose, the symbols $\{\cdot\}_i$ and
$\{\cdot\}_{ij}$ denote entries of a vector and matrix,
respectively. The spectral radius is
$\rho(B) = \max_{1 \le i \le s} |\lambda_i|$, where
$\lambda_1, \ldots, \lambda_s$ are the eigenvalues of a
matrix $B$. $N_i$ is the set of neighboring nodes to node $i$.

2 Our Model 

Given a Peer-to-Peer network graph $G = (V, E)$ with
$|V| = n$ nodes and $|E| = e$ edges, we would like to perform
a joint iterative computation. Each node $i$ starts with a
scalar state $x_i^0 \in \mathbb{R}$ (see footnote 1), and on
each round, sends messages to a subset of its neighbors. We
denote a message sent from node $i$ to node $j$ at round $r$
as $m^r_{ij}$.

Let $N_i$ denote the set of neighboring nodes of $i$. Denote
the neighbors of node $i$ as $n_{i1}, n_{i2}, \ldots, n_{ik}$,
where $k = |N_i|$. We assume, wlog, that each node sends a
message to each of its neighbors. On each round
$r = 1, 2, \ldots$, node $i$ computes, based on the messages
it received, a function
$f : \mathbb{R}^{k+1} \to \mathbb{R}^{k+1}$,

$$\langle x_i^r, m^r_{i,n_{i1}}, \ldots, m^r_{i,n_{ik}} \rangle = f(x_i^{r-1}, m^{r-1}_{n_{i1},i}, \ldots, m^{r-1}_{n_{ik},i}).$$

Namely, the function gets as input the current state and all
the neighbor messages received in this round, and outputs a
new state and the messages to be sent to a subset of the
neighbors in the next round. The iterative algorithms are run
either for a predetermined number of rounds, or until
convergence is detected locally.

In this paper, we are only interested in functions $f$ which
compute weighted sums on each iteration. Next we show that
there is a variety of such numerical methods. Our goal is to
add a privacy preserving layer to the distributed
computation, such that the only information learned by a node
is its share of the output.

1 An extension to the vector case is immediate; we omit it for
clarity of description.

We use the semi-honest adversaries model: in this model
(common in cryptographic research of secure computation)
even corrupted parties are assumed to correctly follow the
protocol specification. However, the adversary obtains the
internal states of all the corrupted parties (including the
transcript of all the messages received), and attempts to use
this information to learn information that should remain
private (see footnote 2). In Section 7 we discuss the
possibility of extending our construction to the "malicious
adversary" model, in which the adversary can behave
arbitrarily.

We define a configurable local system parameter $d_i$, which
bounds the number of corrupt nodes in the local vicinity of
node $i$ (the direct neighbors of node $i$). Whenever this
assumption is violated, the security of our proposed scheme
is affected. This is a stronger requirement on our system,
relative to the traditional global bound on the number of
adversarial nodes.

3 Cryptographic primitives 

We compare several existing approaches from the liter- 
ature of secure multi-party computation and discuss their 
relevance to Peer-to-Peer networks. 

3.1 Random perturbations 

The random additive perturbation method attempts to preserve
the privacy of the data by modifying values of the sensitive
attributes using a randomized process (see [4, 13, 14]). In
this approach, the node sends a value $u + v$, where $u$ is
the original scalar message, and $v$ is a random value drawn
from a certain distribution $V$. In order to perturb the
data, $n$ independent samples $v_1, v_2, \ldots, v_n$ are
drawn from the distribution $V$. The owners of the data
provide the perturbed values
$u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n$ and the cumulative
distribution function $F_V(r)$ of $V$. The goal is to use
these values, instead of the original ones, in the
computation. (It is easy to see, for example, that if the
expected value of $V$ is 0, then the expectation of the sum
of the $u_i + v_i$ values is equal to the expectation of the
sum of the $u_i$ values.) The hope is that by adding random
noise to the individual data points it is possible to hide
the individual values.

2 Security against semi-honest adversaries might be justified
if the parties participating in the protocol are somewhat
trusted, or if we trust the participating parties at the time
they execute the protocol, but suspect that at a later time
an adversary might corrupt them and get hold of the
transcript of the information received in the protocol. We
note that protocols secure against malicious adversaries are
considerably more costly than their semi-honest counterparts.
For example, the generic method of obtaining security against
malicious adversaries is through the GMW compiler [16], which
adds a zero-knowledge proof for every step of the protocol.

The random perturbation model is limited. It supports only
addition operations, and it was shown in [13] that this
approach provides only very limited privacy guarantees. We
include this method only as a lightweight protocol, mainly
for comparing its running time with that of the other
protocols.

3.2 Shamir's Secret Sharing (SSS) 

Secret sharing is a fundamental primitive of cryptographic
protocols. We will describe the secret sharing scheme of
Shamir [23]. The scheme works over a field $F$, and we assume
the secret $s$ to be an element in that field. In a
k-out-of-n secret sharing the owner of the secret wishes to
distribute it between n players such that any subset of k of
them is able to recover the secret, while no subset of k - 1
players is able to learn any information about the secret.

In order to distribute the secret, its owner chooses a random
polynomial $P(\cdot)$ of degree $k - 1$, subject to the
constraint that $P(0) = s$. This is done by choosing random
coefficients $a_1, \ldots, a_{k-1}$ and defining the
polynomial as $P(x) = s + \sum_{i=1}^{k-1} a_i x^i$. Each
player is associated with an identity in the field (denoted
$x_1, \ldots, x_n$ for players $1, \ldots, n$, respectively).
The share that player $i$ receives is the value $P(x_i)$,
namely the value of the polynomial evaluated at the point
$x_i$. It is easy to see that any $k$ players can recover the
secret, since they have $k$ values of the polynomial and can
therefore interpolate it and compute its free coefficient
$s$. It is also not hard to see that any set of $k - 1$
players does not learn any information about $s$, since any
value of $s$ has a probability of $1/|F|$ of resulting in a
polynomial which agrees with the values that the players
have.
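
As an illustration of the scheme just described, the following is a minimal Python sketch of k-out-of-n sharing and recovery. The prime modulus, the choice of identities $x_i = i$, and the function names `make_shares` and `recover_secret` are assumptions made for this example only; they are not taken from the implementation evaluated in this paper. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import random

PRIME = 2_147_483_647  # assumed field modulus (the prime 2^31 - 1)

def make_shares(secret, k, n, prime=PRIME, rng=random):
    """Split `secret` into n shares such that any k of them recover it."""
    # P(x) = secret + a_1 x + ... + a_{k-1} x^{k-1} over Z_prime
    coeffs = [secret % prime] + [rng.randrange(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):       # player i is identified with x_i = i
        y = 0
        for a in reversed(coeffs):  # Horner evaluation of P(x)
            y = (y * x + a) % prime
        shares.append((x, y))
    return shares

def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 yields the free coefficient s."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * (-xj)) % prime
                den = (den * (xi - xj)) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

For example, `recover_secret(make_shares(1234, 3, 5)[:3])` returns the secret from any three of the five shares, while two shares alone are consistent with every possible secret.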

3.3 Homomorphic encryption 

A homomorphic encryption scheme is an encryption scheme which
allows certain algebraic operations to be carried out on the
encrypted plaintext, by applying an efficient operation to
the corresponding ciphertext (without knowing the decryption
key!). In particular, we will be interested in additively
homomorphic encryption schemes: Here, the message space is a
ring (or a field). There exists an efficient algorithm
$+_{pk}$ whose input is the public key of the encryption
scheme and two ciphertexts, and whose output is
$E_{pk}(m_1) +_{pk} E_{pk}(m_2) = E_{pk}(m_1 + m_2)$.
(Namely, this algorithm computes, given the public key and
two ciphertexts, the encryption of the sum of the plaintexts
of the two ciphertexts.) There is also an efficient algorithm
$\cdot_{pk}$, whose input consists of the public key of the
encryption scheme, a ciphertext, and a constant $c$ in the
ring, and whose output is
$c \cdot_{pk} E_{pk}(m) = E_{pk}(c \cdot m)$.
We will also require that the encryption scheme has semantic
security. An efficient implementation of an additive
homomorphic encryption scheme with semantic security was
given by Paillier [21]. In this cryptosystem the encryption
of a plaintext from $[1, N]$, where $N$ is an RSA modulus,
requires two exponentiations modulo $N^2$. Decryption
requires a single exponentiation. We will use this encryption
scheme in our work.

3.3.1 Paillier encryption 

We describe in a nutshell the Paillier cryptosystem. Fuller
details are found in [21].

• Key generation Generate two large primes $p$ and $q$.
The secret key $sk$ is $\lambda = \mathrm{lcm}(p - 1, q - 1)$.
The public key $pk$ includes $N = pq$ and
$g \in \mathbb{Z}^*_{N^2}$ such that $g \equiv 1 \bmod N$.

• Encryption Encrypt a message $m \in \mathbb{Z}_N$ with
randomness $r \in \mathbb{Z}^*_{N^2}$ and public key $pk$ as
$c = g^m r^N \bmod N^2$.

• Decryption Decrypt a ciphertext $c \in \mathbb{Z}^*_{N^2}$.
Decryption is done using
$m = \frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N$,
where $L(x) = (x - 1)/N$.
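
The three operations above, together with the additive homomorphic property, can be sketched in a few lines of Python. This is a toy illustration only: the tiny primes, the choice $g = N + 1$ (one value satisfying $g \equiv 1 \bmod N$), drawing $r$ from the units modulo $N$, and all function names are assumptions made for the example, and the code is in no way secure. It requires Python 3.8 or later.

```python
import math
import random

def keygen(p, q):
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                          # g = 1 mod N
    return (n, g), lam

def L(x, n):
    """The function L(x) = (x - 1) / N from the decryption formula."""
    return (x - 1) // n

def encrypt(pk, m, rng=random):
    n, g = pk
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:      # r must be invertible
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pk, lam, c):
    n, g = pk
    n2 = n * n
    num = L(pow(c, lam, n2), n)
    den = L(pow(g, lam, n2), n)
    return (num * pow(den, -1, n)) % n

def add_encrypted(pk, c1, c2):
    """E(m1) +_pk E(m2): multiply the ciphertexts modulo N^2."""
    return (c1 * c2) % (pk[0] ** 2)

def mul_const(pk, c, const):
    """const ._pk E(m): raise the ciphertext to the constant."""
    return pow(c, const, pk[0] ** 2)
```

With `pk, lam = keygen(101, 113)`, decrypting `add_encrypted(pk, encrypt(pk, 20), encrypt(pk, 22))` gives 42, as long as the plaintext sum stays below $N$.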

4 Our construction 

The main observation we make is that numerous distributed
numerical methods compute in each node a weighted sum of
scalars $m_{ji}$, received from neighboring nodes, namely

$$\sum_{j \in N_i} a_{ij} m_{ji}, \qquad (1)$$

where the weight coefficients $a_{ij}$ are known constants.
This simple building block captures the behavior of multiple
numerical methods. By showing ways to compute this weighted
sum securely, our framework can support many of those
numerical methods. In this section we introduce three
possible approaches for performing the weighted sum
computation.

In Section 5.1 we give an example of the Jacobi algorithm,
which computes such a weighted sum on each iteration.

4.1 Random perturbations 

In each iteration of the algorithm, whenever a node $j$ needs
to send a value $m_{ji}$ to a neighboring node $i$, the node
generates a random number $r_{j,i}$, using the GMP
library [1], from a probability distribution with zero mean.
It then sends the value $m_{ji} + r_{j,i}$ to the other node.
As the number of neighbors increases, the computed noisy sum
$\sum_{j \in N_i} (m_{ji} + r_{j,i})$ converges to the actual
sum $\sum_{j \in N_i} m_{ji}$.

When the node computes a weighted sum of the messages it
received as in equation (1), it multiplies each incoming
message by the corresponding weight. The computed noisy sum
$\sum_{j \in N_i} a_{ij} (m_{ji} + r_{j,i})$ converges to the
actual sum $\sum_{j \in N_i} a_{ij} m_{ji}$.

We note again that this method is considered mainly for a
comparison of its running time with that of the other
methods.
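
A small Python sketch of this perturbation step is shown below. It substitutes Python's standard `random` module for the GMP library mentioned above, and the uniform noise distribution, the `scale` parameter and the function names are assumptions chosen for illustration.

```python
import random

def perturb(message, rng, scale=1.0):
    """Sender side: return m_ji + r_ji, with r_ji drawn from a zero-mean
    distribution (uniform on [-scale, scale] in this sketch)."""
    return message + rng.uniform(-scale, scale)

def noisy_weighted_sum(messages, weights, rng, scale=1.0):
    """Receiver side: weighted sum of the perturbed incoming messages."""
    return sum(a * perturb(m, rng, scale) for a, m in zip(weights, messages))

def average_error(n, seed=0):
    """Per-neighbor error of the noisy sum for n unit messages and weights."""
    rng = random.Random(seed)
    messages = [1.0] * n
    weights = [1.0] * n
    exact = sum(a * m for a, m in zip(weights, messages))
    return abs(noisy_weighted_sum(messages, weights, rng) - exact) / n
```

Calling `average_error(n)` for growing `n` shows the per-neighbor error shrinking, which is the sense in which the noisy sum approaches the actual one; the individual perturbed values, however, remain only weakly protected.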

4.2 Homomorphic Encryption 

We chose to utilize the Paillier encryption scheme, which is
an efficient realization of an additive homomorphic
encryption scheme with semantic security.

Key generation: We use the threshold version of the Paillier
encryption scheme described in [15]. In this scheme, a
trusted third party generates for each node $i$ a private and
public key pair (see footnote 3). The public key is
disseminated to all of node $i$'s neighbors. The private key
$\lambda_i = prvk(i)$ is kept secret from all nodes
(including node $i$). Instead, it is split, using secret
sharing, among the neighbors of node $i$. There is a
threshold $d_i$, which is at most equal to $|N_i|$, the
number of neighbors of node $i$. The scheme ensures that any
subset of $d_i$ of the neighbors of node $i$ can help it
decrypt messages (without the neighbors learning the
decrypted message, or node $i$ learning the private key). If
$d_i = |N_i|$ then the private key is shared by giving each
neighbor $j$ a random value $s_{ji}$ subject to the
constraint $\sum_{j \in N_i} s_{ji} = \lambda_i = prvk(i)$.
Otherwise, if $d_i < |N_i|$, the values $s_{ji}$ are shares
of a Shamir secret sharing of $\lambda_i$. Note that fewer
than $d_i$ neighbors cannot recover the key.

Using this method, all neighboring nodes of node i 
can send encrypted messages using pubk(i) to node i, 
while node i cannot decrypt any of these messages. It can, 
however, aggregate the messages using the homomorphic 
property and ask a coalition of di or more neighbors to help 
it in decrypting the sum. 

The initialization step of this protocol is as follows:

H0 The third party creates for node $i$ a public and private
key pair, $[pubk(i), prvk(i)]$. It sends the public key
$pubk(i)$ to all of node $i$'s neighbors, and splits the
private key into shares, such that each neighbor $j$ of node
$i$ gets a share $s_{ji}$. If $d_i = |N_i|$ then
$prvk(i) = \lambda_i = \sum_{j \in N_i} s_{ji}$. Otherwise
the $s_{ji}$ values are Shamir shares of the private key.

3 It is also possible to generate the key in a distributed
way, without using any trusted party. This option is less
efficient. We show that even the usage of a centralized key
generation process is not efficient enough, and therefore we
did not implement the distributed version of this protocol.

One round of computation: In each round of the algorithm,
when a node $j$ would like to send a scalar value $m_{ji}$ to
node $i$ it does the following:

H1 Encrypt the message $m_{ji}$, using node $i$'s public key,
to get $C_{ji} = E_{pubk(i)}(m_{ji})$.

H2 Send the result $C_{ji}$ to node $i$.

H3 Node $i$ aggregates all the incoming messages $C_{ji}$,
using the homomorphic property, to get
$E_{pubk(i)}(\sum_{j \in N_i} a_{ij} m_{ji})$.

After receiving all messages: Node $i$'s neighbors assist it
in decrypting the result $x_i$, without revealing the private
key $prvk(i)$. This is done as follows (for the case
$d_i = |N_i|$): Recall that in a Paillier decryption node $i$
needs to raise the result computed in [H3] to the power of
its private key $\lambda_i$.

H4 Node $i$ sends all its neighbors the result computed in
[H3]: $C_i = E_{pubk(i)}(\sum_{j \in N_i} a_{ij} m_{ji})$.

H5 Each neighbor $j$ computes a part of the decryption
$w_{ji} = C_i^{s_{ji}}$, where the $s_{ji}$ are node $i$'s
private key shares computed in step [H0], and sends the
result $w_{ji}$ to node $i$.

H6 Node $i$ multiplies all the received values to get:

$$\prod_{j \in N_i} w_{ji} = C_i^{\sum_{j \in N_i} s_{ji}} = C_i^{\lambda_i} = \sum_{j \in N_i} a_{ij} m_{ji} \bmod N. \qquad (2)$$

If $d_i < |N_i|$ then the reconstruction is done using
Lagrange interpolation in the exponent, where node $i$ needs
to raise each $w_{ji}$ value to the power of the
corresponding Lagrange coefficient, and then multiply the
results.
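
Steps [H0] to [H6] for the case $d_i = |N_i|$ can be traced with the toy Python sketch below. It is an illustration under several assumptions that are not part of the protocol description: tiny insecure primes, $g = N + 1$, key shares reduced modulo $N \lambda_i$ so that they stay non-negative, non-negative integer weights, and a final normalization by $L(g^{\lambda_i})$ that is computed directly from the key here for brevity. All function names are hypothetical. Python 3.8 or later is required.

```python
import math
import random

def toy_keygen(p, q):
    """Return (N, g, lambda) for two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, n + 1, lam

def encrypt(n, g, m, rng):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def split_key(lam, n, k, rng):
    """H0: additive shares whose sum equals lambda modulo N * lambda."""
    mod = n * lam   # C^(N * lambda) = 1 mod N^2, so exponents may wrap here
    shares = [rng.randrange(mod) for _ in range(k - 1)]
    shares.append((lam - sum(shares)) % mod)
    return shares

def secure_weighted_sum(messages, weights, p=101, q=113, seed=0):
    """messages[j], weights[j]: value m_ji and weight a_ij of neighbor j."""
    rng = random.Random(seed)
    n, g, lam = toy_keygen(p, q)
    n2 = n * n
    shares = split_key(lam, n, len(messages), rng)        # H0
    ciphers = [encrypt(n, g, m, rng) for m in messages]   # H1, H2
    c_i = 1
    for a, c in zip(weights, ciphers):                    # H3: E(sum a_ij m_ji)
        c_i = c_i * pow(c, a, n2) % n2
    partials = [pow(c_i, s, n2) for s in shares]          # H4, H5: w_ji
    w = 1
    for part in partials:                                 # H6: product = C_i^lambda
        w = w * part % n2
    # Final Paillier step: apply L and divide by L(g^lambda) modulo N.
    # Simplification: mu is derived from lambda directly in this sketch.
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (w - 1) // n * mu % n
```

For instance, `secure_weighted_sum([3, 5, 7], [2, 1, 4])` returns 39, the weighted sum of the three messages, without any single party using the full key in the exponentiation steps.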

Regarding message overhead, first we need to generate and
disseminate public and private keys. This operation requires
$2e$ messages, where $e = |E|$ is the number of graph edges.
In each iteration we send the same number of messages as in
the original numerical algorithm. However, assuming a
security of $l$ bits, and a working precision of $d$ bits, we
increase the size of the message by a factor of 4. Finally,
we add $e$ messages for obtaining the private key parts in
step H4.

Regarding computation overhead, for each sent message, we
need to perform one Paillier encryption in step H1. In step
H3 the destination node performs an additional $k - 1$
multiplications, and one decryption in step H4. At the key
generation phase, we add the generation of $n$ random
polynomials and their evaluation. In step H4 we compute an
extrapolation of those $n$ polynomials. The security of the
Paillier encryption is investigated in [21, 15], where it was
shown that the system provides semantic security.



4.3 Shamir Secret Sharing 

We propose a construction based on Shamir's secret 
sharing, which avoids the computation cost of asymmet- 
ric encryption. In a nutshell, we use the neighborhood of 
a node for adding a privacy preserving mechanism, where 
only a coalition of di or more nodes can reveal the content 
of messages sent to that node. 

In each round of the algorithm, when a node $j$ would like to
send a scalar value $m_{ji}$ to node $i$ it does the
following:

S1 Generate a random polynomial $P_{ji}$ of degree $d_i - 1$,
of the type $P_{ji}(x) = m_{ji} + \sum_{t=1}^{d_i - 1} a_t x^t$
(where $d_i \le |N_i|$).

S2 For each neighbor $l$ of node $i$, create a share
$C_{jil}$ of the polynomial $P_{ji}(x)$ by evaluating it on a
single point $x_l$.

S3 Send $C_{jil}$ to node $l$, which is $i$'s neighbor.

S4 Each neighbor $l$ of node $i$ aggregates the shares it
received from all neighbors of node $i$ and computes the
value $S_{li} = \sum_{j \in N_i} a_{ij} P_{ji}(x_l)$. (Note
that the result of this computation is equal to the value of
a polynomial of degree $d_i - 1$, whose free coefficient is
equal to the weighted sum of all messages sent to node $i$ by
its neighbors.)

S5 Each neighbor $l$ sends the sum $S_{li}$ to node $i$.

S6 Node $i$ treats the value received from node $l$ as a
value of a polynomial of degree $d_i - 1$ evaluated at the
point $x_l$.

S7 Node $i$ interpolates $P_i(x)$ for extracting the free
coefficient, which in this case is the weighted sum of all
messages $\sum_{j \in N_i} a_{ij} m_{ji}$.

Note that the message $m_{ji}$ sent by node $j$ remains
hidden if fewer than $d_i$ neighbors of $i$ collude to learn
it (this is ensured since these neighbors learn strictly
fewer than $d_i$ values of a polynomial of degree $d_i - 1$).
The protocol requires each node $j$ to send messages to all
other neighbors of each of its neighbors. We discuss the
applicability of this requirement in Section 7.
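
Steps [S1] to [S7] can be simulated for a single receiving node $i$ with the Python sketch below. All parties run inside one process, so the sketch shows only the arithmetic and not the message passing. The prime modulus, the evaluation points $x_l = l$, non-negative integer weights and the function names are assumptions made for the example. Python 3.8 or later is required.

```python
import random

PRIME = 2_147_483_647  # assumed field modulus (the prime 2^31 - 1)

def eval_poly(coeffs, x, prime=PRIME):
    """Horner evaluation of a polynomial given by its coefficient list."""
    y = 0
    for a in reversed(coeffs):
        y = (y * x + a) % prime
    return y

def interpolate_at_zero(points, prime=PRIME):
    """Lagrange interpolation at x = 0, i.e. the free coefficient."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        total = (total + yi * num * pow(den, -1, prime)) % prime
    return total

def sss_weighted_sum(messages, weights, d, seed=0, prime=PRIME):
    """messages[j], weights[j]: value m_ji and weight a_ij of neighbor j."""
    rng = random.Random(seed)
    k = len(messages)            # |N_i|
    xs = list(range(1, k + 1))   # evaluation point x_l of neighbor l
    # S1: each sender j hides m_ji in a random polynomial of degree d - 1
    polys = [[m % prime] + [rng.randrange(prime) for _ in range(d - 1)]
             for m in messages]
    # S2, S3: share C_jil = P_ji(x_l) goes from sender j to neighbor l
    shares = [[eval_poly(p, x, prime) for x in xs] for p in polys]
    # S4: neighbor l combines the shares it holds using the weights a_ij
    sums = [sum(a * shares[j][l] for j, a in enumerate(weights)) % prime
            for l in range(k)]
    # S5-S7: node i receives the points (x_l, S_li) and interpolates;
    # any d of them determine a polynomial of degree d - 1
    return interpolate_at_zero(list(zip(xs, sums))[:d], prime)
```

For example, `sss_weighted_sum([3, 5, 7], [2, 1, 4], d=2)` yields 39. No asymmetric operation is involved, which is the source of the efficiency gain over the homomorphic variant.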

4.4 Extending the method to support 
multiplication 

Assume that node $i$ needs to compute the multiplication of
the values of two messages that it receives from nodes $j$
and $j'$. The Shamir secret sharing scheme can be extended to
support multiplication using the construction of Ben-Or,
Goldwasser and Wigderson, whose details appear in [8]. This
requires two changes to the basic protocol. First, the degree
of the polynomials must be strictly less than $|N_i|/2$,
where $|N_i|$ is the number of neighbors of the node
receiving the messages. (This means, in particular, that
security is now only guaranteed as long as fewer than half of
the neighbors collude.) In addition, the neighboring nodes
must exchange a single round of messages after receiving the
messages from nodes $j$ and $j'$. We have not implemented
this variant of the protocol.

Figure 1. Schematic message flow in the proposed methods. The
task of node $i$ is to compute the sum of all messages:
$m_{ki} + m_{ji} + m_{li}$. (a) describes a message sent from
$j$ to $i$ using random perturbation, (b) describes step [S3]
in our SSS scheme, where the same message $m_{ji}$ is split
into shares sent to all of $i$'s neighbors, (c) describes
step [S4] in our SSS scheme, where shares destined to $i$ are
aggregated by its neighbors, (d) shows step [S5] in our SSS
scheme, which is equivalent (in terms of message flow) to
step [H2] in our homomorphic scheme.

4.5 Working in different fields 

The operations that can be applied to secrets in the Shamir
secret sharing scheme, or to encrypted values in a
homomorphic encryption scheme, are defined in a finite field
or ring over which the schemes are defined (for example, in
the secret sharing case, over a field $\mathbb{Z}_p$ where
$p$ is a prime number). The operations that we want to
compute, however, might be defined over the real numbers.
Working in a field is sufficient for computing additions or
multiplications of integers, if we know that the size of the
field is larger than the maximum result of the operation. If
the basic elements we work with are real numbers, we can
round them first to the nearest integer, or, alternatively,
first multiply them by some constant $c$ (say, $c = 10^6$)
and then round the result to the closest integer. (This
essentially means that we work with an accuracy of $1/c$ if
the computation involves only additions, or an accuracy of
$1/c^d$ if the computation involves summands composed of up
to $d$ multiplications.)

Handling division is much harder, since we are essentially
limited to working with integer numbers. One workaround is
possible if we know in advance that a number $x$ might have
to be divided by a number from a known set $D$ (say, the
numbers in the range $[1, 100]$). In that case we first
multiply $x$ by the least common multiple (lcm) of the
numbers in $D$. This initial step ensures that dividing the
result by a number from $D$ results in an integer number.
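
The scaling and the lcm workaround can be sketched in Python as follows. The modulus, the scale constant, and the convention of reading field elements above $p/2$ as negative numbers are assumptions added for this illustration; the text above does not prescribe how signs are represented.

```python
from functools import reduce
from math import gcd

PRIME = 2_147_483_647  # assumed field size; must exceed twice the largest |result| * SCALE
SCALE = 10**6          # the constant c from the text

def encode(x, scale=SCALE, prime=PRIME):
    """Map a real number to a field element: multiply by c, round, reduce."""
    return round(x * scale) % prime

def decode(v, scale=SCALE, prime=PRIME):
    """Inverse map; elements above prime / 2 are read as negative values
    (an assumed convention, not taken from the text)."""
    if v > prime // 2:
        v -= prime
    return v / scale

def lcm_of(divisors):
    """Least common multiple of the set D of possible divisors."""
    return reduce(lambda a, b: a * b // gcd(a, b), divisors, 1)
```

For example, `decode((encode(1.25) + encode(-0.75)) % PRIME)` gives 0.5, and pre-multiplying an integer by `lcm_of(range(1, 11))`, which is 2520, guarantees that a later division by any number from 1 to 10 is exact. With these toy parameters only values of magnitude up to roughly 1000 fit in the field, so a real deployment would need a much larger modulus.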

5 Case Study: neighborhood based collaborative filtering

To demonstrate the usefulness of our approach, we give a
specific instance of a problem our framework can solve while
preserving users' privacy. Our chosen example is in the field
of collaborative filtering. We have chosen to implement the
neighborhood based collaborative filtering algorithm, a
state-of-the-art algorithm, winner of the Netflix progress
prize of 2007. When adapting this algorithm to a Peer-to-Peer
network, there are two main challenges: first, the algorithm
is centralized, while we would like to distribute it without
losing accuracy of the computed result. Second, we would like
to add a privacy preserving layer, which prevents the
computing nodes from learning any information about the
ratings of neighboring nodes or other nodes, except for the
computed solution.

We first describe the centralized version, and later we
extend it to be computed in a Peer-to-Peer network. Given a
possibly sparse user ratings matrix $R_{m \times n}$, where
$m$ is the number of users and $n$ is the number of items,
each user would like to compute output ratings for all the
items.

In the neighborhood based approach [6], the output rating is
computed using a weighted average of the neighboring peers:

$$r_{ui} = \sum_{j \in N_i} w_{ij} r_{uj}.$$

Our goal is to find the weights matrix $W$ where $w_{ij}$
signifies the weight node $i$ assigns node $j$.

We define the following least square minimization problem for
user $i$:

$$\min_{w} \sum_{v} \Big( r_{vi} - \sum_{j \in N_i} w_{ij} r_{vj} \Big)^2.$$

The optimal solution is found by differentiation, which
yields a linear system of equations $Rw = b$. The optimal
weights (for each user) are given by:

$$w = (R^T R)^{-1} R^T b \qquad (3)$$

We would like to distribute the neighborhood based col- 
laborative filtering problem to be computed in a Peer-to- 
Peer network. Each peer has its own rating as input (the 
matching row of the matrix R) and the goal is to com- 
pute locally, using interaction with neighboring nodes, the 
weight matrix W, where each node has the matching row in 
this matrix. Furthermore, the peers would like to keep their 
input rating private, where no information is leaked during 
the computation to neighboring or other nodes. The peers 
will obtain only their matching output rating as a result of 
this computation. 

We propose a secure multi-party computation framework to
solve the collaborative filtering problem efficiently and
distributively, preserving users' privacy. The computation
does not reveal any information about users' prior ratings,
or about the computed results.

5.1 The Jacobi algorithm for solving systems
of linear equations

In this section we give an example of one of the simplest
iterative algorithms for solving systems of linear equations,
the Jacobi algorithm. This will serve as an example of an
algorithm our framework is able to compute, for solving the
neighborhood based collaborative filtering problem. Note that
there are numerous numerical methods we can compute securely
using our framework, among them Gauss-Seidel, EM (expectation
maximization), conjugate gradient, gradient descent, belief
propagation, Cholesky decomposition, principal component
analysis, SVD, etc.

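Given a system of linear equations $Ax = b$, where $A$ is a
matrix of size $n \times n$ with $a_{ii} \neq 0$ for all $i$,
and $b \in \mathbb{R}^n$, the Jacobi algorithm [9] starts
from an initial guess $x^0$, and iterates:

$$x_i^r = \frac{b_i - \sum_{j \in N_i} a_{ij} x_j^{r-1}}{a_{ii}} \qquad (4)$$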

The Jacobi algorithm is easily distributed since initially
each node selects an initial guess $x_i^0$, and the values
$x_j^r$ are sent among neighbors. A sufficient condition for
the algorithm's convergence is that the spectral radius
$\rho(I - D^{-1}A) < 1$, where $I$ is the identity matrix and
$D = \mathrm{diag}(A)$. This algorithm is known to work in
asynchronous
settings as well. In practice, when converging, the
convergence speed of the Jacobi algorithm is logarithmic in
$n$.

Our goal is to compute a privacy-preserving version of 
the Jacobi algorithm, where the inputs of the nodes are pri- 
vate, and no information is leaked during the rounds of the 
computation. 
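
The plain (non-private) Jacobi update of equation (4) can be sketched in Python as below; in the secure version, the weighted sum inside the loop is the part that would be computed with one of the protocols of Section 4. The dense list-of-lists matrix, the zero initial guess and the fixed number of rounds are simplifications assumed for this example.

```python
def jacobi(A, b, rounds=50):
    """Run `rounds` synchronous Jacobi iterations for A x = b.

    Each entry x[i] plays the role of node i's state; the inner sum is the
    weighted sum of the values received from its neighbors (nonzero a_ij).
    """
    n = len(b)
    x = [0.0] * n  # initial guess x^0
    for _ in range(rounds):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n)
                        if j != i and A[i][j] != 0)) / A[i][i]
            for i in range(n)
        ]
    return x
```

For the diagonally dominant system `A = [[4, 1], [1, 3]]`, `b = [6, 7]`, the iteration converges to the solution `x = [1, 2]`.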

Note that the Jacobi algorithm serves as an excellent example
since its simple update rule contains all the basic
operations we would like to support: addition, multiplication
and subtraction. Our framework supports all of those
numerical operations, thus capturing numerous numerical
algorithms.

5.2 Using the Jacobi algorithm for solving 
the neighborhood based collaborative 
filtering problem 

First, we perform a distributed preconditioning of the matrix
$R$. Each node $i$ divides its input row of the matrix $R$ by
$R_{ii}$. This simple operation is done to avoid the division
in (4), while not affecting the solution vector $w$.

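Second, since the Jacobi algorithm's input is a square
$n \times n$ matrix, and our rating matrix $R$ is of size
$m \times n$, we use the following "trick": we construct a
new symmetric data matrix $\tilde{R}$ based on the
non-rectangular rating matrix $R \in \mathbb{R}^{m \times n}$:

$$\tilde{R} \triangleq \begin{pmatrix} I & R^T \\ R & 0 \end{pmatrix} \in \mathbb{R}^{(m+n) \times (m+n)}. \qquad (5)$$

Additionally, we define a new vector of variables
$\tilde{w} \triangleq \{w^T, z^T\}^T \in \mathbb{R}^{(m+n) \times 1}$,
where $w \in \mathbb{R}^{m \times 1}$ is the (to be shown)
solution vector and $z \in \mathbb{R}^{n \times 1}$ is an
auxiliary hidden vector, and a new observation vector
$\tilde{b} \triangleq \{0^T, b^T\}^T \in \mathbb{R}^{(m+n) \times 1}$.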

4 Computing the pseudo inverse solution (equation 2) iteratively can be 
done more efficiently using newer algorithms, for example rill . For the 
purpose of clarity of explanation, we use the Jacobi algorithm.
Now, we would like to show that solving the symmetric
linear system $\tilde{R}\tilde{w} = \tilde{b}$ and taking the first n
entries of the corresponding solution vector $\tilde{w}$ is
equivalent to solving the original system $Rw = b$. Note that in
the new construction the matrix $\tilde{R}$ is still sparse, and
has at most 2mn off-diagonal nonzero elements. Thus, when running
the Jacobi algorithm we have at most 2mn messages per round.

Writing explicitly the symmetric linear system's equations,
we get

$$w + R^T z = 0, \qquad Rw = b.$$

By extracting w we have

$$w = (R^T R)^{-1} R^T b,$$

the desired solution of equation 3.

6 Experimental Results 

We have implemented our proposed framework using a
large scale simulation. Our simulation is written in C,
consists of about 1500 lines of code, and uses MPI for running
the simulation in parallel. We run the simulation on a cluster
of Linux Pentium IV computers, 2.4 GHz, with 4 GB RAM
memory. We use the open source Paillier implementation 
of@. 

We use several large topologies for demonstrating the
applicability of our approach. The DIMES dataset [24] is
an Internet router topology of around 300,000 routers and
2.2 million communication links connecting them, captured
in January 2007. The Blog network is a social network,
a web crawl of Internet blogs with half a million blog sites
and eleven million links connecting them. Finally, the Net- 
flix iff! movie ratings data, consists of around 500,000 users 
and 100,000,000 movie ratings. This last topology is a bi- 
partite graph with users at one side, and movies at the other. 
This topology is not a Peer-to-Peer network, but it is relevant
for the collaborative filtering problem. We have artificially
created a Peer-to-Peer network, where each user is a node, the
movies are nodes as well, and edges are the ratings assigned
to the movies.
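
The construction of this artificial topology can be sketched as below. This is a minimal illustration only: the function name `build_bipartite`, the tuple-tagged node ids and the three toy ratings are assumptions, and the sketch is not the data structure used in the C simulation.

```python
from collections import defaultdict


def build_bipartite(ratings):
    """Turn (user, movie, rating) triples into an undirected weighted graph."""
    graph = defaultdict(dict)
    for user, movie, rating in ratings:
        # Tag ids so that user 10 and movie 10 are distinct nodes.
        u, m = ("user", user), ("movie", movie)
        graph[u][m] = rating  # edge weight is the rating
        graph[m][u] = rating
    return graph


ratings = [(1, 10, 5), (1, 11, 3), (2, 10, 4)]  # toy data, not from the Netflix set
g = build_bipartite(ratings)
num_nodes = len(g)
num_edges = sum(len(nbrs) for nbrs in g.values()) // 2
print(num_nodes, num_edges)
```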



Topology          Nodes      Edges       Data Source

Blogs Web Crawl   1.5M       8M          IBM
DIMES             337,326    2,249,832   DIMES
Netflix           497,759    100M        Netflix



Table 1. Topologies used for experimentation

We ignore algorithm accuracy since this problem was
addressed in detail in [6]. We are mainly concerned with
the overheads of the privacy preserving mechanisms. Based
on the experimental results shown below, we conclude that
the main overhead in implementing our proposed mechanisms
is the computational overhead, since the communication
latency exists anyway in the underlying topology,
and we compare the run of algorithms with and without the
added privacy mechanisms overhead. For that purpose, we
ignore the communication latency in our simulations. This
can be justified, because in the random perturbations and
homomorphic encryption schemes, we do not change the
number of communication rounds, so the communication
latency remains the same with or without the added privacy
preserving mechanisms. In the SSS scheme, we double the
number of communication rounds, so the incurred latency
is doubled as well.

Table 2 compares the running times of the basic operations
in the three schemes. Each operation was repeated
100,000 times and an average is given. As expected, the
heaviest computation is the Paillier asymmetric encryption,
with a security parameter of 2,048 bits. It can be easily
verified that while the SSS basic operations take around tens
of microseconds, the Paillier basic operations take fractions
of a second (except for the homomorphic multiplication,
which is quite efficient since it does not involve
exponentiation). In a Peer-to-Peer network, where a peer likely
has tens of connections, sending an encrypted message to all of
them will take several seconds. Furthermore, this time estimation
assumes that the values sent by the function are scalars. In
the vector case, the operation will be much slower.

Table 3 outlines the running time needed to run 8 iterations
of the Jacobi algorithm on the different topologies.
Four modes of operation are listed: "no privacy preserving"
means we run the algorithm without adding any privacy
layer, for baseline timing comparison. Next, our three
proposed schemes are shown.

In the Netflix dataset, we had to use eight computing
nodes in parallel, because our simulation memory requirement
could not fit into one processor.

As clearly shown in Table 3, our SSS scheme has significantly
reduced computation overhead relative to the homomorphic
encryption scheme, while having an equivalent
level of security (assuming that the Paillier encryption is
semantically secure). In a Peer-to-Peer network, with tens
of neighbors, the homomorphic encryption scheme incurs a
high overhead on the computing nodes.

7 Conclusion and Future Work

As demonstrated in the experimental results section,
the secret sharing scheme has the lowest
computation overhead relative to the other schemes.
Furthermore, this scheme does not involve a trusted third party,
as needed by the homomorphic encryption scheme for the
threshold key generation phase. The size of the messages
sent using this method is about the same as in the origi-
Scheme               Operation                              Time (microseconds)   Msg size (bytes)

Random perturbation  Adding noise / Receiver operation      0.0783745             8

SSS                  Polynomial generation and evaluation   11.18382125           8
                     Polynomial extrapolation               6.13709025

Paillier             Key generation                         5016199.4             2048
                     Encryption                             203478.62
                     Decryption                             193537.97
                     Multiplication                         99.063958



Table 2. Running time of local operations. As expected, the Paillier cryptosystem basic operations 
are time consuming relative to the SSS scheme. 
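
The per-operation averages of Table 2 can be turned into a rough per-round local cost for a single peer. The sketch below assumes a peer with 20 neighbors (the text only says "tens of connections") and assumes one encryption, or one polynomial generation and evaluation, per neighbor and per scalar message. Both assumptions, as well as the helper name `per_round_seconds`, are illustrative.

```python
# Per-operation averages taken from Table 2, in microseconds.
PAILLIER_ENCRYPT_US = 203478.62
SSS_SHARE_US = 11.18382125  # polynomial generation and evaluation


def per_round_seconds(cost_us, neighbors):
    """Local time to prepare one scalar message for each neighbor."""
    return cost_us * neighbors / 1e6


NEIGHBORS = 20  # assumed degree, not a measured value
paillier_s = per_round_seconds(PAILLIER_ENCRYPT_US, NEIGHBORS)
sss_s = per_round_seconds(SSS_SHARE_US, NEIGHBORS)
print(f"Paillier: {paillier_s:.2f} s per round")
print(f"SSS:      {sss_s * 1e3:.3f} ms per round")
```

Under these assumptions the Paillier figure comes to a few seconds per round, in line with the "several seconds" estimate, while the SSS figure stays well below a millisecond.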



Topology   Scheme                 Time (HH:MM:SS)   Computing nodes

DIMES      None                   0:33.36
           Random Perturbations   0:35.27
           SSS                    10:53.44
           Paillier               28:44:24.00

Blogs      None                   1:28.16
           Random Perturbations   1:34.85
           SSS                    38:00.24
           Paillier               101:52:00.00

Netflix    None                   5:31.14           8
           Random Perturbations   5:54.69           8
           SSS                    21:40.00          8
           Paillier

Table 3. Running time of eight iterations of the Jacobi algorithm. The baseline timing is compared to
running without any privacy preserving mechanisms added. Empirical results show that the computation
time of the homomorphic scheme is about 1,350 times slower than that of the SSS scheme.



nal method, unlike the homomorphic encryption which sig- 
nificantly increases message sizes. However, the drawback 
of this scheme is that neighboring nodes to node i need to 
communicate directly between themselves (and each mes- 
sage sent to node i needs to be converted to messages sent 
to all its neighbors). In Peer-to-Peer systems with a locality
property it might be reasonable to assume that communication
between the neighbors of node i is possible. There
is a way to circumvent this requirement, by adding asymmetric
encryption. Each node will have a public key, and
messages destined to this node are encrypted using its public
key. That way, if node j needs to send a message to node
l, it can ask node i to deliver it, while ensuring that node i
does not learn the content of the message. We identify this
extension to our scheme as an area for future work.

Another area of future work is the extension of our work 
to support malicious participants. The threshold Paillier 
cryptosystem supports verification keys lfT5l . that enable 
participants to verify validity of encrypted messages. Simi- 
larly, verifiable secret sharing schemes like ifTTl can be used 



to secure secret sharing against malicious participants, by 
verifying validity of polynomial shares. 
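
For reference, the polynomial shares that such a verifiable scheme would check are those of plain Shamir secret sharing. The sketch below shows only the unverified sharing, with the two local operations listed in Table 2 (polynomial generation and evaluation, and extrapolation at zero). The field modulus, the threshold and the function names are illustrative assumptions, and no verification step is included.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an illustrative choice, not taken from the paper


def make_shares(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares_a = make_shares(12, threshold=3, num_shares=5)
shares_b = make_shares(30, threshold=3, num_shares=5)
# Adding shares pointwise yields shares of the sum of the two secrets.
shares_sum = [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(shares_a, shares_b)]
recovered = reconstruct(shares_sum[:3])
print(recovered)
```

Any three of the five summed shares recover the sum of the two secrets, which is the property that lets additions be carried out on shares.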

Regarding the operation in synchronous communication 
rounds, we have assumed, in order to simplify our exposi- 
tion, that the iterations of the peers are synchronized. How- 
ever, in practice it is not valid to assume that the clocks 
and message delays are synchronized in a large Peer-to-Peer 
network. Luckily, it is known that linear iterative algorithms 
such as the Jacobi algorithm converge in asynchronous set- 
tings as well (meaning that some peers might have made 
more iterations than other peers but the resulting computa- 
tion will still converge to the same optimal solution). 
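
A simplified model of this behavior is sketched below: in every round only a random subset of the nodes performs its Jacobi update, using whatever values are currently visible, so some nodes complete more iterations than others. The update probability, the seed, the round count and the toy system are illustrative assumptions, and real message delays are not modeled.

```python
import random


def async_jacobi(A, b, rounds=400, update_prob=0.5, seed=0):
    """Jacobi where, in each round, only a random subset of nodes updates."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(rounds):
        for i in range(n):
            if rng.random() < update_prob:
                # Node i uses the neighbor values that are currently visible.
                x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x


# Strictly diagonally dominant toy system with exact solution [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [5.0, 8.0, 8.0]
x = async_jacobi(A, b)
print(x)
```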